SIMP59: Data Selection and Visualisation VT25
7.5 credits
This lecture introduces key concepts in data analysis using RMarkdown notebooks, focusing on working with data structures such as tables, networks, and nested data. Participants will learn how to import data frames, filter rows, and select relevant columns to refine their datasets. The session will cover handling missing values and identifying outliers to ensure data quality. We will explore the dplyr package, using the pipe operator to streamline data transformations, and discuss the principles of tidy data for efficient analysis and visualization.
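The filtering, column selection, missing-value and outlier steps described above can be sketched in a short dplyr pipeline. This is only an illustration under stated assumptions: the `survey` data frame and its columns are hypothetical, and the 1.5 × IQR rule is one common convention for flagging outliers, not a method prescribed by the course.

```r
library(dplyr)

# Hypothetical data; in practice this would be imported, e.g. with read.csv()
survey <- data.frame(
  id     = 1:6,
  age    = c(23, 35, NA, 41, 29, 120),
  income = c(310, 420, 380, NA, 295, 400),
  city   = c("Lund", "Uppsala", "Lund", "Lund", "Uppsala", "Lund"),
  stringsAsFactors = FALSE
)

# Filter rows and select columns in one piped sequence
cleaned <- survey %>%
  filter(!is.na(age), !is.na(income)) %>%  # drop rows with missing values
  select(id, age, income)                  # keep only the relevant columns

# Flag values outside 1.5 * IQR from the quartiles (assumed convention)
flagged <- cleaned %>%
  mutate(
    age_outlier = age < quantile(age, 0.25) - 1.5 * IQR(age) |
                  age > quantile(age, 0.75) + 1.5 * IQR(age)
  )

print(flagged)
```

Flagging outliers in a new column, rather than deleting them straight away, keeps the decision about what to do with them visible in the notebook.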
We will also explore how to structure data analysis around research questions and variables, ensuring a clear focus on meaningful insights. We will introduce grouping and aggregation techniques in dplyr to summarize data effectively, allowing for comparisons across different categories. Participants will also learn how to reshape data by lengthening and widening formats to better align with analytical needs. The session will cover methods for exporting cleaned and processed data frames for further use.
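As a minimal sketch of grouping, aggregation and export, the example below summarises a hypothetical `scores` data frame by category and writes the result to a CSV file. The data, the column names and the output file name are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical data: one score per participant in two groups
scores <- data.frame(
  group = c("control", "control", "treatment", "treatment", "treatment"),
  score = c(10, 14, 18, 20, 22),
  stringsAsFactors = FALSE
)

# One row per group, with the group size and mean score
by_group <- scores %>%
  group_by(group) %>%
  summarise(n = n(), mean_score = mean(score), .groups = "drop")

# Export the summarised data frame for further use (temporary file here)
out_file <- file.path(tempdir(), "scores_by_group.csv")
write.csv(by_group, out_file, row.names = FALSE)

print(by_group)
```

Writing to `tempdir()` keeps the sketch free of side effects; in a real project the path would point into the project folder.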
Figure 1: In this section of the book, you’ll learn how to import, tidy, transform, and visualize data.
datum, datasets
tables, normalizing (SQL style)
structured data (Structured Query Language)
unique identifier (id)
tidy format
networks
unnesting
web scraping, XML markup
unstructured data, text (NLP)
20 Spreadsheets
21 Databases
22 Arrow
23 Hierarchical data
24 Web scraping
find or generate a relevant dataset with regard to the RQ
import dataset to R data frame
subsetting data, e.g. “select from x where y==1 order by z”
columns, e.g. select(x, y)
summarizing, grouping, aggregating etc.
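The SQL-style subsetting step above maps fairly directly onto dplyr verbs. The sketch below assumes a hypothetical data frame `x` with columns `id`, `y` and `z`; `WHERE` corresponds to `filter()`, `ORDER BY` to `arrange()`, and the column list to `select()`.

```r
library(dplyr)

# Hypothetical table x with a filter column y and a sort column z
x <- data.frame(
  id = 1:5,
  y  = c(1, 0, 1, 1, 0),
  z  = c(30, 10, 20, 5, 40)
)

# Roughly: SELECT id, z FROM x WHERE y = 1 ORDER BY z
result <- x %>%
  filter(y == 1) %>%  # WHERE y = 1
  arrange(z) %>%      # ORDER BY z (ascending)
  select(id, z)       # SELECT id, z

print(result)
```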
12 Logical vectors
13 Numbers
14 Strings
15 Regular expressions
16 Factors
17 Dates and times
18 Missing values
19 Joins
Figure 2: The column names of pivoted columns become values in a new column. The values need to be repeated once for each row of the original dataset.
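The lengthening described in the caption can be sketched with `pivot_longer()` from tidyr. The `wide` data frame and the names `measurement` and `value` are hypothetical choices for illustration. The names of the pivoted columns (`bp1`, `bp2`) end up as values in the new `measurement` column, and each `id` is repeated once per pivoted column.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per measurement
wide <- data.frame(
  id  = c("A", "B"),
  bp1 = c(100, 140),
  bp2 = c(120, 115),
  stringsAsFactors = FALSE
)

# Lengthen: column names become values, cell values move into one column
long <- pivot_longer(
  wide,
  cols      = c(bp1, bp2),
  names_to  = "measurement",
  values_to = "value"
)

# Widen again to recover the original layout
wide_again <- pivot_wider(long, names_from = measurement, values_from = value)

print(long)
```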
Data collection (Nov 12)
Exam question 1
Data analysis (Nov 26)
Exam question 2
Workshop 2, Dec 2
References